Skip to content

Instrument recall trace so phase metrics account for total duration (#2361)#2371

Merged
nicoloboschi merged 1 commit into
mainfrom
fix/recall-trace-phase-coverage-2361
Jun 23, 2026
Merged

Instrument recall trace so phase metrics account for total duration (#2361)#2371
nicoloboschi merged 1 commit into
mainfrom
fix/recall-trace-phase-coverage-2361

Conversation

@nicoloboschi

Copy link
Copy Markdown
Collaborator

Summary

Fixes #2361. SearchTrace.summary.phase_metrics[] only accounted for ~10-15% of total_duration_seconds; the rest sat in un-instrumented blocks of _search_with_retries. This adds a phase metric for each missing block so the named phases now account for ~100% of the wall clock.

New phase metrics

Phase Covers Why it was missing
backend_acquisition _get_read_backend() timed inside total, never recorded
combined_scoring apply_combined_scoring + boosts + final sort + reranked-entry trace serialization the reranking metric captured step_duration before this block ran
prefer_observations_dedup Step 4.8 supersession DB query uninstrumented await
chunk_fetch obs-source + chunk SQL + tiktoken encoding no metric
result_serialization visit recording (O(n²) entry-point check) + dict conversion no metric
source_fact_fetch source-fact SQL passes no metric
entity_build entity SQL + MemoryFact/EntityState construction no metric

Diagnostic phases (excluded from the coverage sum)

Flagged details.diagnostic because they overlap or fall outside the timeline:

  • retrieval_{semantic,bm25,graph,temporal} — children of the parallel_retrieval umbrella (summing both double-counts retrieval)
  • semaphore_wait / connection_wait — surfaced as the issue requested; these waits precede/overlap other phases
  • trace_finalizefinalize() snapshots total_duration_seconds at entry, so its own serialization is necessarily after the total; reported for visibility but not part of the sum

On the proposed CI assertion

The issue's assert abs(total - named) < 0.005 can't hold as written: the phase list intentionally keeps an umbrella (parallel_retrieval) plus per-method sub-phases, and trace_finalize runs after the total snapshot. Instead, test_trace_phase_coverage sums only the non-diagnostic partition and asserts it covers total within a small tolerance and never exceeds it — a guard against both un-instrumented regressions and double-counting. Measured 100% coverage locally.

Testing

  • tests/test_search_trace.py::test_trace_phase_coverage (new) + existing test_search_with_trace pass
  • ./scripts/hooks/lint.sh, ruff, ty all clean

…2361)

_search_with_retries only recorded ~10-15% of total_duration_seconds as
named phase metrics; the rest sat in un-instrumented blocks (backend
acquisition, combined scoring, chunk/source-fact/entity enrichment,
result serialization). Add a phase metric for each, and split the
combined-scoring work out of the reranking metric (which captured its
duration before scoring ran).

Mark the per-method retrieval splits, pool waits, and trace_finalize as
diagnostic (they overlap parallel_retrieval or fall outside the total
window) so they are excluded from the coverage sum.

Adds test_trace_phase_coverage asserting the non-diagnostic phases sum to
total without exceeding it.
@nicoloboschi nicoloboschi merged commit 9e06237 into main Jun 23, 2026
92 of 98 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Recall trace phase metrics don't sum to total_duration_seconds

1 participant